Research design, Methods and Data Collection

Lecture 8. Quantitative methods 1: Foundations

Youssouf Merouani

2024-02-13

Structure of the lectures

  • 45-minute block – 10-minute break – 45-minute block
  • Last 5 minutes of class: Open for questions
  • Ask for clarifications throughout
  • Discussions and reflections

Content of the lectures

  • Lecture 8 – 13/02
    • Quantitative data
    • Univariate and Multivariate analysis in economics
    • Models and Research questions
  • Lecture 9 – 16/02
    • Content analysis and Surveys
    • Sampling and Imputation
  • Lecture 10 – 20/02
    • Identification and Causal Diagrams
    • Experiments and Quasi-experimental methods

Doing quantitative research requires the use of quantitative data.

What is quantitative data?

Two dimensions are useful to explain what it is, what it is not, and the gray areas.

Variable types

  • We quantify by defining variables that take the form of (different) numbers.
  • An instrument decides the measure (“number”) of a variable.
  • Levels (sophistication) of measurement explain the inherent order of the variable.

Categorical variables

  • Entities are divided into distinct categories:
    • Binary variable: only two categories (e.g., dead or alive)
    • Nominal variable: more than two categories (e.g., omnivore, vegetarian, vegan, or fruitarian).
    • Ordinal variable: Nominal with logical order (e.g., grades in A, B, C, D, E, F)
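
As a minimal sketch of why the distinction matters, the snippet below uses made-up diet and grade observations (the data and the names `diets`, `grades`, and `GRADE_ORDER` are illustrative assumptions, not course data). For a nominal variable only counting is meaningful, so the mode is the natural summary; for an ordinal variable the order can also be used, so a median category exists.

```python
from collections import Counter

# Hypothetical observations, invented for illustration only.
diets = ["omnivore", "vegan", "omnivore", "vegetarian", "omnivore"]
grades = ["B", "A", "C", "B", "F", "B", "D"]

# Nominal: categories have no order, so we can only count them.
diet_mode = Counter(diets).most_common(1)[0][0]

# Ordinal: categories have a logical order, so sorting is meaningful.
GRADE_ORDER = ["F", "E", "D", "C", "B", "A"]  # worst to best
rank = {grade: position for position, grade in enumerate(GRADE_ORDER)}
sorted_grades = sorted(grades, key=rank.get)

# With an odd number of observations the median is the middle element.
median_grade = sorted_grades[len(sorted_grades) // 2]

print("Most common diet:", diet_mode)
print("Median grade:", median_grade)
```

Note that the grades are never averaged: the distance between an A and a B is not defined, only their order.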

Continuous variables

  • Entities get a distinct score:
    • Interval variable: Equal intervals on the variable represent equal differences in the property being measured (e.g., the difference between 6 and 8 is equivalent to the difference between 13 and 15).
    • Ratio variable: Interval but with a meaningful “0”, which gives meaningful ratios (e.g., score 16 on an anxiety scale means that the person is twice as anxious as someone with an 8)
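
A small sketch of the interval/ratio distinction, using temperature as an assumed example that is not part of the slides: Celsius is an interval scale (its zero is arbitrary), while Kelvin is a ratio scale (its zero is meaningful). Differences agree on both scales, but ratios only make sense on the ratio scale.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Interval property: equal differences mean the same thing on both scales.
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratio property: only meaningful when zero means "none of the quantity".
ratio_c = warm_c / cold_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = warm_k / cold_k  # roughly 1.035, the physically meaningful ratio

print(f"Difference: {diff_c:.2f} °C vs {diff_k:.2f} K")
print(f"Ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```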

What do you do with these variables?

You have two roads:
univariate analysis or multivariate analysis.


Univariate Analysis: Exploring Data One Variable at a Time

  • Definition: Analyzing one variable at a time to produce descriptive statistics, giving data meaning through numerical summaries.
  • Common Tools:
    • Frequency Tables: Show how often each value occurs; include counts and percentages.
    • Diagrams: Bar charts, pie charts, histograms, and infographics display data visually.
    • Measures of Central Tendency: Mean, median, and mode identify typical values within data.
    • Measures of Dispersion: Range and standard deviation measure variability among data.
    • Boxplots: Visual summaries showing both central tendency (median) and dispersion (range), highlighting outliers.
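
The tools above can be sketched with Python's standard library. The variable `team_sizes` is a hypothetical sample invented for illustration; it is not taken from any dataset used in the lecture.

```python
import statistics
from collections import Counter

# Hypothetical sample: number of inventors per patent (invented data).
team_sizes = [1, 1, 2, 1, 3, 2, 1, 4, 2, 1]
n = len(team_sizes)

# Frequency table: value -> (count, percentage).
counts = Counter(team_sizes)
freq_table = {value: (count, 100 * count / n)
              for value, count in sorted(counts.items())}

# Measures of central tendency.
mean = statistics.mean(team_sizes)
median = statistics.median(team_sizes)
mode = statistics.mode(team_sizes)

# Measures of dispersion.
value_range = max(team_sizes) - min(team_sizes)
std_dev = statistics.stdev(team_sizes)  # sample standard deviation

# Quartiles: the numbers a boxplot draws as the box and the median line.
q1, q2, q3 = statistics.quantiles(team_sizes, n=4)

for value, (count, pct) in freq_table.items():
    print(f"{value}: {count} ({pct:.0f}%)")
print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={value_range}, sd={std_dev:.2f}, quartiles=({q1}, {q2}, {q3})")
```

The mean (1.8) lies above the median (1.5) here because a few larger teams pull it upward, which is the kind of pattern univariate summaries are meant to reveal.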

When to use univariate analysis?

  • As part of a multivariate analysis to provide an understanding of your variables.
  • As part of a qualitative study to provide additional context (turns the study into mixed methods if used extensively)
  • In its own right as an explorative study with new data (example: you have constructed a new and impressive dataset about a phenomenon we previously didn’t know much about)

Example 1: Evolution of patenting activity

[Figure: time series chart. x-axis: years 1791 to 1899; axes: TOTAL (0 to 14,000) and WOMEN (0 to 240); series: Women-linked Patent, Total.]

Women-linked Patents: 1791-1900, France. Source: (Merouani and Perrin, 2024)

Example 1: Evolution of patenting activity

[Figure: same chart as above, with annotations for the 1844 Law, the French Revolution of 1848, and the Franco-Prussian War. x-axis: years 1791 to 1899; axes: TOTAL (0 to 14,000) and WOMEN (0 to 240); series: Women-linked Patent, Total.]

Women-linked Patents: 1791-1900, France. Source: (Merouani and Perrin, 2024)

Example 2: How does the gender patenting gap evolve over time?

[Figure: line chart. x-axis: years 1791 to 1899; y-axis: 0.00 to 6.00; series: Gender gap, Average.]

Gender Patenting Gap: 1791-1900, France. Source: (Merouani and Perrin, 2024)

Example 3: How does the connectivity of inventors evolve?

Average degree evolution for two networks. Source: (Merouani, 2024)

Example 4: How does collaboration differ across sectors and gender?

Average Degree Per IPC Sector. The size of each circle represents the relative share of connections within that sector compared to others. Source: (Merouani, 2024)

Studying economic phenomena from a multivariate perspective? Look into econometrics!

Econometrics

  • Econometrics: Utilizes statistical methods to uncover relationships between economic variables.
  • Key in predicting macroeconomic variables (interest rates, inflation, GDP), with applications extending to politics, education, and beyond.
  • Here, the purpose is not to predict but to understand the relationships underlying economic phenomena.
  • Addresses the challenges of working with nonexperimental (i.e., observational) data through ceteris paribus reasoning: holding other relevant factors fixed.

[Figure: "UFO sightings in New Mexico correlates with Patents granted in the US". UFO sightings reported in New Mexico (source: National UFO Reporting Center) plotted against the total number of patents granted in the US (source: USPTO), 1975-2020; r = 0.937, r² = 0.878, p < 0.01. tylervigen.com/spurious/correlation/3099]

UFO and patenting activity

Avoid the pitfalls of spurious relationships 😊
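
To see how a high correlation can arise without any link between two variables, here is a minimal simulation sketch. Both series are synthetic and generated independently (the names `series_a` and `series_b`, the trend slopes, and the noise levels are all assumptions made for illustration); they only share an upward time trend. Looking at year-to-year changes instead of levels is one common check, shown here as an illustration and not as the method used in the lecture.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

random.seed(42)  # fixed seed so the sketch is reproducible
periods = 200

# Two independent series that both happen to grow over time.
series_a = [1.0 * t + random.gauss(0, 5) for t in range(periods)]
series_b = [3.0 * t + random.gauss(0, 15) for t in range(periods)]

# Correlation in levels is driven almost entirely by the shared trend.
r_levels = pearson_r(series_a, series_b)

# First differences remove the trend and leave only the independent noise.
changes_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
changes_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_changes = pearson_r(changes_a, changes_b)

print(f"Correlation in levels:  {r_levels:.3f}")
print(f"Correlation in changes: {r_changes:.3f}")
```

The correlation in levels is very high even though neither series influences the other, which is why a theoretical reason for the relationship is needed before interpreting it.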

How?
Quantitative research is deductive.
So, start from theory!

… or just simple "economic understanding" 🤯

What do you want to explain?

Y                     X
Dependent variable    Independent variable
Explained variable    Explanatory variable
Response variable     Control variable
Predicted variable    Predictor variable
Regressand            Regressor

Example of Job Training and Worker Productivity

  • Examine the effects of job training on worker productivity.
  • Discuss with your classmates the following:
    • How can we measure training?
    • How can we measure worker productivity?
    • What other things do we need to control for? (ceteris paribus)

Example: Job Training and Worker Productivity

  • Job Training -> weeks/hours spent in job training 🤸‍♀️

  • Worker Productivity -> wage (hourly wage) 💰

  • (other) Big things that affect productivity:

    • Education: years of formal education 👩‍🎓
    • Experience: years of work experience 🏢
  • We add an error term that captures “all other things” 🤷‍♀️

  • 💰 = 👩‍🎓 + 🏢 + 🤸‍♀️ + 🤷‍♀️

\[ \text{wage} = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{exp} + \beta_3 \text{training} + u \]
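
The wage equation above can be estimated by ordinary least squares (OLS). The following is a minimal sketch on simulated data: the "true" coefficients in `TRUE_BETA`, the value ranges, and the helper `ols` are all assumptions invented for illustration, not estimates from real wage data. Because the data are generated from known coefficients, we can check that OLS recovers them.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical coefficients: intercept, educ, exp, training.
TRUE_BETA = [2.0, 0.8, 0.2, 0.5]

rows, wages = [], []
for _ in range(500):
    educ = random.randint(8, 20)      # years of formal education
    exper = random.randint(0, 30)     # years of work experience
    training = random.randint(0, 10)  # weeks of job training
    u = random.gauss(0, 1)            # error term: "all other things"
    x = [1.0, educ, exper, training]  # leading 1.0 is the intercept column
    rows.append(x)
    wages.append(sum(b * v for b, v in zip(TRUE_BETA, x)) + u)

beta_hat = ols(rows, wages)
for name, estimate in zip(["const", "educ", "exp", "training"], beta_hat):
    print(f"{name:>8}: {estimate:.3f}")
```

Each slope is read ceteris paribus: the coefficient on training is the expected change in the hourly wage from one more week of training, holding education and experience fixed.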

Reflection: Why did we choose a regression model for that RQ?

Example: Explain Women’s patenting

\[ \Pr(Y_i = 1) = \Phi\left(\beta_0 + \beta_1 \cdot \text{Year1844}_i + \beta_2 \cdot \text{PatentLength}_i + \beta_3 \cdot \text{TeamSize}_i + \sum_{j=1}^{J} \beta_{4j} \cdot \text{Sector}_{ji} + \sum_{k=1}^{K} \beta_{5k} \cdot \text{Education}_{ki}\right) \]

\(Y_i = 1\) indicates a women-linked patent; \(\Phi\) is the standard normal cumulative distribution function (probit model).

  • Financial resources
  • Team size
  • Female oriented sectors
  • Human capital

A good RQ bridges theory and reality.

Crafting a Good Research Question (1)

  • Essential Criteria:
    • Answerable: Must be possible to answer with empirical evidence.
    • Informative: Should enhance understanding of how the world works.
  • Key Components:
    • Clarity: Avoids ambiguity, allowing for specific evidence to answer.
    • Relevance: Connects to broader theories or explanations of phenomena.
    • Feasibility: Data necessary for answering the question can realistically be obtained.
  • From Theory to Hypothesis: A good question bridges theory (why things happen) to hypothesis (what specifically will be observed).

Crafting a Good Research Question (2)

  • Where we find them: Curiosity and opportunity.
  • Evaluating Questions:
    • Does it provide new insights?
    • Can unexpected results challenge or refine our understanding?

References

Clark, T., Foster, L., Bryman, A., and Sloan, L. 2021. Bryman’s Social Research Methods, Oxford University Press
Huntington-Klein, N. 2022. The effect: An introduction to research design and causality, Boca Raton, CRC Press, Taylor & Francis Group
Merouani, Y. and Perrin, F. 2024. Women Inventors: On the Origins of the Gender Patenting Gap, Lund Papers in Economic History, vol. 255
Wooldridge, J. M. 2020. Introductory econometrics: A modern approach, Boston, MA, Cengage Learning